1.  Introduction

This paper is concerned with an algorithm for the log n time computation of
the complete histogram of an n × n gray-level array. It makes use of a “fat”
pyramid implemented on an SIMD hypercube multiprocessor with very high
processor utilization. These terms are explained below.

In [2] Tanimoto describes an algorithm for histogram computation on a
standard pyramid which involves performing one log n time "pass" from the
bottom of the pyramid to the top for each gray-level value to be computed.
The algorithm presented below for fat pyramids computes the complete
histogram in a single pass.

The  algorithm  is  perhaps  more  properly  called  a  hypercube  multiprocessor 
algorithm.  However,  the  concept  of  the  fat  pyramid  provides  a  convenient  way 
of  describing  the  computations  involved. 

2.  SIMD  Hypercube  Multiprocessors 

A hypercube multiprocessor consists of $2^M$ (for some M) processing cells
interconnected as if each were located at one vertex of an M-dimensional
hypercube, so that any two cells share a direct connection if and only if
their corresponding hypercube vertices are connected by a hypercube edge.
Furthermore, if the hypercube has unit side, and we assign to each cell an
address given by the M-dimensional coordinates of the corresponding vertex,
then we can see that two cells will share a direct connection if and only if
their addresses differ in exactly one bit position.
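
The adjacency rule can be sketched in a few lines of Python. This is only an illustration; the function names `neighbors` and `adjacent` are assumptions introduced here, not part of the paper.

```python
def neighbors(addr: int, m: int) -> list[int]:
    """Addresses directly connected to `addr` in an m-dimensional hypercube."""
    # Flipping exactly one of the m address bits yields each neighbour.
    return [addr ^ (1 << bit) for bit in range(m)]


def adjacent(a: int, b: int) -> bool:
    """True when the two addresses differ in exactly one bit position."""
    diff = a ^ b
    # A non-zero value with a single set bit is a power of two.
    return diff != 0 and diff & (diff - 1) == 0
```

Each cell therefore has exactly M direct connections, one per address bit.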

In a SIMD (Single-Instruction stream, Multiple-Data stream) multiprocessor,
all processing cells execute in unison a stream of commands broadcast from a
single controller. The exception to this is that some subset of the cells may
ignore a given instruction, the decision to do so being based on the contents
of their local cell memory.

3.  Fat  Pyramids 

A grid multiprocessor consists of a rectangular array of nodes, with a
processing cell at each node, and with each node directly connected to its
four nearest neighbors (except on the edges of the grid). In a pyramid
multiprocessor, there are several levels of grids, each containing (say)
one-fourth as many nodes as the one below, and with each node connected
directly to four nodes in the level below. This arrangement allows for some
combinations of the information in all of the nodes at the bottom level to be
computed in log n time, where the bottom level is (say) an n × n square, by
combining at each node on level l+1 the information from that node’s four
children on level l (level 0 is the bottom level).

In a “fat” pyramid, the size of a processor associated with a node in the
pyramid depends on the level of the pyramid in which it appears. Specifically,
a processor at level l+1 will have four times as much storage, and possibly
four times as much processing power, as a processor at level l. A fat pyramid
allows for combining operations in which the amount of information per node
increases with the level of the node in the pyramid.

4.  Fat  Pyramids  and  Hypercubes

The pyramid algorithm presented below has the property that only one level
is “active” at any given time; that is, while a computation is being carried
out on a given level, no other level needs to perform a computation. Many
other pyramid algorithms also have this property. In such cases, it does not
seem desirable to have processors lying idle when the level with which they
are associated is not active. Suppose instead that some multiprocessor system
were multiplexed among all the levels, with all of its processing cells being
divided up among the nodes of a given level when that level is active. Thus if
one cell were associated with each node at level 0 of the pyramid, there would
be four cells cooperating to carry out the computations for each node at
level 1, sixteen cells grouped to perform the operations required by each node
at level 2, and in general $4^l$ cells associated with each pyramid node at
level l. I will call such an arrangement a “collapsed” fat pyramid.

In order for several cells to cooperate efficiently in performing some
computation, it would be useful if there were a high degree of
interconnectivity among them. Just such an arrangement can be nicely provided
on a hypercube. At any level l of the pyramid, all of the cells of the
hypercube are divided up among the nodes at that level, in blocks of $4^l$
cells per node. It is possible to do this so that the cells implementing each
node are themselves connected in a hypercube of dimension 2l; thus no two of
the $4^l$ cells in a block are more than 2l links apart. Also, if we define
two processor blocks to be adjacent if any cell in one block is directly
connected to some cell in the other block, then we can also arrange for all of
the blocks at any given level to form a rectangular grid under this kind of
adjacency, which is of course what we desire. Further discussion of such an
arrangement follows.

5.  Gray  Codes  and  Grids 

In [1] Chan and Saad describe how to embed a grid in a hypercube by the use
of Gray Codes, which are number sequences in which the binary representations
of two adjacent elements differ in exactly one bit position. The Gray Code
used in what follows will be the binary reflected Gray Code, for which the
i-bit sequence is obtained by prefixing each element of the (i-1)-bit sequence
with a 0, and then appending a reversed copy of the (i-1)-bit sequence with
each element prefixed by a 1. (So, for example, the 2-bit binary reflected
Gray Code is {00, 01, 11, 10}, and the 3-bit sequence is {000, 001, 011, 010,
110, 111, 101, 100}.)
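
As a small illustration, the reflect-and-prefix construction can be written out directly and compared with the usual closed form `i ^ (i >> 1)`. The function names here are assumptions chosen for this sketch.

```python
def gray(i: int) -> int:
    """i-th element of the binary reflected Gray Code (closed form)."""
    return i ^ (i >> 1)


def gray_sequence(bits: int) -> list[str]:
    """Build the `bits`-bit sequence by reflecting and prefixing."""
    seq = [""]
    for _ in range(bits):
        # Prefix the shorter sequence with 0, then its reversed copy with 1.
        seq = ["0" + w for w in seq] + ["1" + w for w in reversed(seq)]
    return seq
```

Both forms produce the 2-bit and 3-bit sequences quoted above.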

Consider a cell with grid coordinates (x,y), each of which has p bits, and
let

    $g_x = g_{x,p-1} g_{x,p-2} \cdots g_{x,0}$

and

    $g_y = g_{y,p-1} g_{y,p-2} \cdots g_{y,0}$

be the xth and yth elements of the p-bit binary reflected Gray Code,
respectively. Then the hypercube address of this cell is taken as the
concatenation

    $g_{x,p-1} g_{x,p-2} \cdots g_{x,0} \, g_{y,p-1} g_{y,p-2} \cdots g_{y,0}$.


It is not hard to see that two cells adjacent in the grid will have hypercube
addresses differing in exactly one bit position. Thus any two such cells will
also be adjacent in the hypercube. This mapping is illustrated in Figure 1.
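
The adjacency claim for the concatenated mapping can be checked exhaustively on a small grid. The sketch below assumes a $2^p \times 2^p$ grid; the helper names are illustrative and not from the paper.

```python
def gray(i: int) -> int:
    """i-th element of the binary reflected Gray Code."""
    return i ^ (i >> 1)


def concat_address(x: int, y: int, p: int) -> int:
    """Hypercube address formed by concatenating the p-bit codes g_x and g_y."""
    return (gray(x) << p) | gray(y)


def one_bit_apart(a: int, b: int) -> bool:
    """True when the two addresses differ in exactly one bit position."""
    return bin(a ^ b).count("1") == 1


def grid_adjacency_preserved(p: int) -> bool:
    """Check every horizontal and vertical grid edge of the 2^p x 2^p grid."""
    n = 1 << p
    for x in range(n):
        for y in range(n):
            here = concat_address(x, y, p)
            if x + 1 < n and not one_bit_apart(here, concat_address(x + 1, y, p)):
                return False
            if y + 1 < n and not one_bit_apart(here, concat_address(x, y + 1, p)):
                return False
    return True
```

Moving one step in x changes only g_x by one bit, and likewise for y, which is why every grid edge lands on a hypercube edge.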

6.  A  New  Mapping 

If we instead interleave the bits of $g_x$ and $g_y$, giving

    $g_{x,p-1} g_{y,p-1} g_{x,p-2} g_{y,p-2} \cdots g_{x,0} g_{y,0}$,

then we get not only a mapping from a hypercube to a grid which preserves
adjacency, but we also get a natural hypercube-to-collapsed-fat-pyramid
mapping in the following sense. This mapping allows the hypercube address of
any cell in any block at any level l to be regarded as the concatenation of
two binary numbers, the first one being a 2(L-l) bit number (where L is the
number of levels in the pyramid) determining which processor block the cell is
in, and the other being a 2l bit number giving a local address of the cell
within its block. In other words, all of the cells comprising any block on any
level l will have hypercube addresses which are all in the range $k 4^l$ to
$(k+1) 4^l - 1$ for some k. This is illustrated in Figures 2 through 4.
(Notice that this property does not hold for the mapping depicted in
Figure 1.)

Consider, for example, cell 57 in Figure 3, which depicts a four-level
collapsed fat pyramid at level 1. Now, 57 in binary is 111001, so this is cell
01 within processor block 1110. Notice also that if we apply in reverse the
mapping described above to just the number 1110, we get (2,3) (de-interleaving
the bits of 1110 gives 11 and 10, and these are the two-bit binary reflected
Gray Code values $g_2$ and $g_3$, respectively), which is precisely the
position within the level 1 grid of the block containing cell 57.
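
The cell 57 example can be reproduced with a short sketch of the interleaved mapping. All function names below are assumptions made for illustration; only the mapping itself comes from the text.

```python
def gray(i: int) -> int:
    """i-th element of the binary reflected Gray Code."""
    return i ^ (i >> 1)


def gray_inverse(g: int) -> int:
    """Position of the code word g within the Gray Code sequence."""
    i = 0
    while g:
        i ^= g
        g >>= 1
    return i


def interleave(gx: int, gy: int, p: int) -> int:
    """Interleave two p-bit words as g_x,p-1 g_y,p-1 ... g_x,0 g_y,0."""
    addr = 0
    for j in range(p):
        addr |= ((gx >> j) & 1) << (2 * j + 1)
        addr |= ((gy >> j) & 1) << (2 * j)
    return addr


def deinterleave(addr: int, p: int) -> tuple[int, int]:
    """Recover the two p-bit words from an interleaved address."""
    gx = gy = 0
    for j in range(p):
        gx |= ((addr >> (2 * j + 1)) & 1) << j
        gy |= ((addr >> (2 * j)) & 1) << j
    return gx, gy


def cell_address(x: int, y: int, p: int) -> int:
    """Hypercube address of grid cell (x, y) under the interleaved mapping."""
    return interleave(gray(x), gray(y), p)


def split_address(addr: int, level: int) -> tuple[int, int]:
    """Split an address into (block number, local address) at the given level."""
    return addr >> (2 * level), addr & ((1 << (2 * level)) - 1)


def block_position(block: int, bits: int) -> tuple[int, int]:
    """Grid position of a block whose number has 2*bits bits."""
    gx, gy = deinterleave(block, bits)
    return gray_inverse(gx), gray_inverse(gy)
```

For instance, `split_address(57, 1)` separates 111001 into block 1110 and local address 01, and `block_position(0b1110, 2)` recovers the grid position (2,3).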

The  property  that  a  cell’s  hypercube  address  is  also  simply  the  concatenation 
of  its  block  number  and  local  address  is  not  essential  to  the  algorithm  presented 
below,  but  greatly  simplifies  its  expression. 

7.  A  Sketch  of  the  Histogram  Algorithm 

Given an n × n array of gray-level values, the problem is to compute a
histogram of these values over the array. Suppose that each gray-level value
is stored in one cell at the bottom level of a fat pyramid implemented on a
hypercube as described above.

Essentially, each block of processors at a given level will contain the
histogram for the gray-level values of the cells in that block. Processing
proceeds from one level to the next by combining the histograms of the four
child blocks to make up the histogram of the parent block. (Although this
seems similar to a divide-and-conquer algorithm on a grid, the two schemes are
different in that the cells constituting each block on any level of the
collapsed fat pyramid are interconnected to form a hypercube. Furthermore,
there is greater interconnectivity between adjacent blocks in the collapsed
fat pyramid.)

Suppose that the gray-level values are s bits long (assume s even). The
algorithm consists of two different phases, one for levels below s/2, and one
for the higher levels. This is because at levels below s/2, the width of a
cell's local address is less than the width of a gray-level value. The
significance of this fact will become clear below.



Let the current level be l < s/2. Then the algorithm works as follows.
(Assume in what follows that k always represents L-l.) Consider four blocks
$B_0$, $B_1$, $B_2$, and $B_3$ at level l which are to be combined into a
larger block B at level l+1. Consider a cell in any $B_i$ with local address
c. Let its complete address be

    $b_{2k-1} b_{2k-2} \cdots b_0 \, c_{2l-1} c_{2l-2} \cdots c_0$.

We assume at this point in the algorithm that this cell contains the histogram
information over $B_i$ for all gray-level values of the form

    $g_{s-1} g_{s-2} \cdots g_{2l} \, c_{2l-1} c_{2l-2} \cdots c_0$,

that is, ending in c. (Efficient schemes for doing this are discussed below.
It turns out that for a pyramid with $2^{20}$ nodes on the bottom level, each
containing an 8-bit gray-level value, no level of the pyramid requires more
than 36 bits per cell for the storage of all necessary histogram information.)
Once the blocks are merged to form B, we will want any cell with address

    $b_{2k-1} b_{2k-2} \cdots b_2 \, c_{2l+1} c_{2l} c_{2l-1} c_{2l-2} \cdots c_0$

to contain the histogram information over B for all gray-level values of the
form

    $g_{s-1} g_{s-2} \cdots g_{2l+2} \, c_{2l+1} c_{2l} c_{2l-1} c_{2l-2} \cdots c_0$.

Consider, for example, a cell in $B_0$, with local address c. Let its complete
address be

    $b_{2k-1} b_{2k-2} \cdots b_2 \, 0 0 \, c_{2l-1} c_{2l-2} \cdots c_0$.

This cell must collect the histogram information over each $B_i$ for all the
gray-level values of the form

    $g_{s-1} g_{s-2} \cdots g_{2l+2} \, 0 0 \, c_{2l-1} c_{2l-2} \cdots c_0$.

Fortunately, all of this information is distributed across the cells at

    $b_{2k-1} b_{2k-2} \cdots b_2 \, 0 0 \, c_{2l-1} c_{2l-2} \cdots c_0$,

which is the cell itself,

    $b_{2k-1} b_{2k-2} \cdots b_2 \, 0 1 \, c_{2l-1} c_{2l-2} \cdots c_0$,

which is a cell in block $B_1$,

    $b_{2k-1} b_{2k-2} \cdots b_2 \, 1 0 \, c_{2l-1} c_{2l-2} \cdots c_0$,

which is a cell in block $B_2$, and

    $b_{2k-1} b_{2k-2} \cdots b_2 \, 1 1 \, c_{2l-1} c_{2l-2} \cdots c_0$,

which is a cell in block $B_3$. This is because no other cells in B have
addresses ending in c, and so no other cells in B can contain histogram
information over the $B_i$ for gray-level values ending in c. But we see by
inspecting the addresses of the four cells given above that they are connected
in a square by hypercube edges. So it will take only two parallel transmission
operations to perform the necessary redistribution of the histogram
information.

At level l = s/2, consider a cell in block B with local address c. This cell
will contain the histogram information over B for all gray-level values ending
in c. But c is 2l = s bits long. Thus the cell contains only the histogram
information over B for the single gray-level value c.

For levels l ≥ s/2, the algorithm works as follows. Within a block B at some
level l, we want any cell whose address ends in

    $g = g_{s-1} g_{s-2} \cdots g_0$

to contain the histogram information over B for the single gray-level value g.
We can see that this is true when we reach level s/2. Now consider the four
level l blocks $B_0$, $B_1$, $B_2$, and $B_3$. Let the addresses of these
blocks be

    $b_{2k-1} b_{2k-2} \cdots b_2 \, 0 0$,

    $b_{2k-1} b_{2k-2} \cdots b_2 \, 0 1$,

    $b_{2k-1} b_{2k-2} \cdots b_2 \, 1 0$,

and

    $b_{2k-1} b_{2k-2} \cdots b_2 \, 1 1$,

respectively.

Suppose that these four blocks are to be combined to form the level l+1 block
B. We must combine the histogram information from each of those blocks to form
the histogram for B.

For example, the histogram information over $B_0$ for the gray-level value

    $g = g_{s-1} g_{s-2} \cdots g_0$

is contained at any cell in $B_0$ with local address ending in g, that is,
with complete address

    $b_{2k-1} b_{2k-2} \cdots b_2 \, 0 0 \, c_{2l-1} c_{2l-2} \cdots c_s \, g_{s-1} g_{s-2} \cdots g_0$

for some $c_i$'s. But we know also that the histogram information over $B_1$,
$B_2$, and $B_3$ for the gray-level value g can be found in

    $b_{2k-1} b_{2k-2} \cdots b_2 \, 0 1 \, c_{2l-1} c_{2l-2} \cdots c_s \, g_{s-1} g_{s-2} \cdots g_0$,

    $b_{2k-1} b_{2k-2} \cdots b_2 \, 1 0 \, c_{2l-1} c_{2l-2} \cdots c_s \, g_{s-1} g_{s-2} \cdots g_0$,

    $b_{2k-1} b_{2k-2} \cdots b_2 \, 1 1 \, c_{2l-1} c_{2l-2} \cdots c_s \, g_{s-1} g_{s-2} \cdots g_0$,

respectively. These four cells are connected in a square by hypercube edges.
So in two parallel transmission operations the cell in each block can collect
the histogram information from the cells in the other three blocks, and can
compute the histogram information over the entire block B for the gray-level
value g.


8.  The  Algorithm  in  More  Detail 

Suppose that the current level is l < s/2. Consider a block B at this level,
and a cell whose local address within B is

    $c = c_{2l-1} c_{2l-2} \cdots c_0$.

The histogram to be stored in cell c will be stored as a mapping from

    $g_{s-1} g_{s-2} \cdots g_{2l}$

to the number of cells in B containing the gray-level value

    $g_{s-1} g_{s-2} \cdots g_{2l} \, c_{2l-1} c_{2l-2} \cdots c_0$.

That is, the histogram will actually be stored as a mapping from the first
s-2l bits of a gray-level value to the histogram value over B for that
gray-level value, with the lower 2l bits of the gray-level value being given
implicitly by the local address of cell c within block B.

When combining blocks from level l to form larger blocks at level l+1, the
histograms are combined in the following way. Each cell first splits its
histogram mapping into two smaller mappings by splitting the mapping's domain
into even and odd subsets. Then the last bit is truncated from each element in
the domains of both new mappings. (For example, if the original mapping was
{(0,m0), (1,m1), (2,m2), (3,m3), (4,m4), (5,m5), (6,m6), (7,m7)}, then the two
new mappings would be {(0,m0), (1,m2), (2,m4), (3,m6)} and {(0,m1), (1,m3),
(2,m5), (3,m7)}.) Call these the “even” and “odd” mappings, respectively.
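
The split can be sketched as follows, assuming (for illustration only) that a mapping is held as a Python dictionary from domain element to count. The function name `split_even_odd` is hypothetical.

```python
def split_even_odd(mapping: dict[int, int]) -> tuple[dict[int, int], dict[int, int]]:
    """Split a mapping by the last bit of each key, then drop that bit."""
    even = {key >> 1: count for key, count in mapping.items() if key & 1 == 0}
    odd = {key >> 1: count for key, count in mapping.items() if key & 1 == 1}
    return even, odd
```

Applied to a mapping with keys 0 through 7, the even mapping keeps the entries for 0, 2, 4, 6 under the new keys 0, 1, 2, 3, as in the example above.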


Now every cell with address of the form

    $b_{2k-1} b_{2k-2} \cdots b_1 \, 0 \, c_{2l-1} c_{2l-2} \cdots c_0$

sends its odd mapping to the cell at address

    $b_{2k-1} b_{2k-2} \cdots b_1 \, 1 \, c_{2l-1} c_{2l-2} \cdots c_0$,

and every cell with address of the form

    $b_{2k-1} b_{2k-2} \cdots b_1 \, 1 \, c_{2l-1} c_{2l-2} \cdots c_0$

sends its even mapping to the cell at address

    $b_{2k-1} b_{2k-2} \cdots b_1 \, 0 \, c_{2l-1} c_{2l-2} \cdots c_0$.

Each cell which receives an even mapping adds (element-wise) this mapping to
its own even mapping. Similarly, each cell which receives an odd mapping adds
that mapping to its own odd mapping. Notice that the histograms stored in each
cell now have domains half the size of what they were previously.


Each cell now splits its histogram into even and odd mappings as before. Then
every cell with address of the form

    $b_{2k-1} b_{2k-2} \cdots b_2 \, 0 \, b_0 \, c_{2l-1} c_{2l-2} \cdots c_0$

sends its odd mapping to the cell at address

    $b_{2k-1} b_{2k-2} \cdots b_2 \, 1 \, b_0 \, c_{2l-1} c_{2l-2} \cdots c_0$,

and every cell with address of the form

    $b_{2k-1} b_{2k-2} \cdots b_2 \, 1 \, b_0 \, c_{2l-1} c_{2l-2} \cdots c_0$

sends its even mapping to the cell at address

    $b_{2k-1} b_{2k-2} \cdots b_2 \, 0 \, b_0 \, c_{2l-1} c_{2l-2} \cdots c_0$.

Mappings are again combined as described above. At this point each cell
contains a mapping whose domain is one fourth the size of the domain of its
original mapping. Furthermore, a cell at address

    $b_{2k-1} b_{2k-2} \cdots b_2 \, c_{2l+1} c_{2l} c_{2l-1} \cdots c_0$,

where

    $b = b_{2k-1} b_{2k-2} \cdots b_2$

is the address of the level l+1 block B containing the cell

    $c = c_{2l+1} c_{2l} c_{2l-1} \cdots c_0$,

will now contain the mapping from

    $g_{s-1} g_{s-2} \cdots g_{2l+2}$

to the number of cells in block B containing gray-level value

    $g_{s-1} g_{s-2} \cdots g_{2l+2} \, c_{2l+1} c_{2l} c_{2l-1} \cdots c_0$.

For levels at or above s/2, the algorithm becomes a simpler special case of
the algorithm for the lower levels. Assuming the current level is l ≥ s/2, we
combine histograms in the following way. Every cell with address of the form

    $b_{2k-1} b_{2k-2} \cdots b_1 \, 0 \, c_{2l-1} c_{2l-2} \cdots c_0$

sends its histogram value to the cell at address

    $b_{2k-1} b_{2k-2} \cdots b_1 \, 1 \, c_{2l-1} c_{2l-2} \cdots c_0$

and vice-versa. Each cell then adds its received value to its stored value.
Then every cell with address of the form

    $b_{2k-1} b_{2k-2} \cdots b_2 \, 0 \, b_0 \, c_{2l-1} c_{2l-2} \cdots c_0$

sends its value to cell

    $b_{2k-1} b_{2k-2} \cdots b_2 \, 1 \, b_0 \, c_{2l-1} c_{2l-2} \cdots c_0$

and vice-versa, and each cell then again adds its received value to its stored
one, completing the computation for level l+1.

Once the top level is reached, any cell with address of the form

    $c_{2L-1} c_{2L-2} \cdots c_s \, g_{s-1} g_{s-2} \cdots g_0$

will contain the total number of cells having gray-level value

    $g_{s-1} g_{s-2} \cdots g_0$.
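
The whole procedure can be checked with a sequential simulation. The sketch below is not a parallel implementation: it models each exchange by letting a cell read the relevant half of its partner's mapping, and it assumes s is even, s ≤ 2L, and that mappings are held as `Counter` objects. The function names are hypothetical.

```python
from collections import Counter


def histogram_on_hypercube(values, L, s):
    """Simulate the exchanges on a hypercube of 4**L cells.

    values[a] is the s-bit gray level initially held by the cell at address a.
    Returns one Counter per cell; key 0 of the Counter at address a holds the
    count for the gray level given by the low s bits of a.
    """
    assert len(values) == 4 ** L and s % 2 == 0 and s <= 2 * L
    assert all(0 <= g < 2 ** s for g in values)
    # Level 0: every block is a single cell holding the pair (g, 1).
    cells = [Counter({g: 1}) for g in values]
    for level in range(L):
        # Two exchanges per level, across address bits 2l and 2l+1.
        for bit in (2 * level, 2 * level + 1):
            updated = []
            for addr, own in enumerate(cells):
                partner = cells[addr ^ (1 << bit)]
                merged = Counter()
                if bit < s:
                    # Lower phase: keep the even or odd half that matches
                    # this cell's address bit, and truncate the last key bit.
                    keep = (addr >> bit) & 1
                    for mapping in (own, partner):
                        for key, count in mapping.items():
                            if key & 1 == keep:
                                merged[key >> 1] += count
                else:
                    # Upper phase: both cells exchange and add single counts.
                    merged[0] = own[0] + partner[0]
                updated.append(merged)
            cells = updated
    return cells


def read_histogram(cells, s):
    """Read the count for each gray level g from the cell at address g."""
    return [cells[g][0] for g in range(2 ** s)]
```

With L = 3 and s = 4, for example, the first two levels use the even/odd splitting and the last level uses the simple exchange-and-add step, after which every cell whose address ends in g holds the count for g.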

9.  Efficiently  Storing  the  Histogram  Mappings 

Suppose mappings are stored as sets of ordered pairs indicating which
elements in the domain map to non-zero values, and what those values are. Then
we see that the number of ordered pairs stored in a given cell is bounded
above by two quantities as follows.

First, the histogram over a block with $4^l$ cells cannot have more than
$4^l$ gray-levels mapping to non-zero values, since there are at most this
many distinct gray-levels represented somewhere in the block. So the number of
pairs stored in a cell at level l cannot exceed $4^l$.

Secondly, if each cell only stores pairs for those gray-level values which end
in the local address of that cell within its block, then if the local address
of the cell is 2l bits long and the gray-level values are s bits long, the
number of pairs stored in a cell cannot exceed $2^{s-2l}$.

Thus the maximum number of pairs stored in a cell will be $2^{s/2}$, that is,
the square root of the range of the gray-levels, and this maximum will occur
when l = s/4. However, we also notice that the total of the histogram values
in the pairs in some cell at level l cannot exceed $4^l$, since the block
containing this cell has only $4^l$ cells contributing to the histogram. We
can use this fact at levels near s/4 to encode the mappings in cells at these
levels actually using fewer bits than are required by the ordered-pair
representation, as will be demonstrated in the example below.
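
The two bounds combine into a single expression, sketched here under the assumption that the second bound is taken as 1 once the local address is at least s bits wide. The function name `max_pairs` is illustrative.

```python
def max_pairs(level: int, s: int) -> int:
    """Upper bound on the ordered pairs a cell may hold at the given level."""
    by_block_size = 4 ** level                 # distinct values in the block
    by_address = 2 ** max(s - 2 * level, 0)    # values ending in the local address
    return min(by_block_size, by_address)
```

For s = 8 the bound peaks at level 2 = s/4, where both quantities equal 16.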

Let us consider the case of a pyramid with $2^{20}$ cells at the bottom level,
each cell containing an eight bit gray-level value. At level 0 of the pyramid,
each processor block contains a single cell. The histogram mapping for any
cell then is given by the single pair (g,1), or really just the value g,
requiring eight bits.

At level 1 of the pyramid, each processor block contains four cells. The
histogram mapping for a block is stored as four pieces, one in each cell. The
piece in each cell can be given as a mapping from six-bit gray-level values to
three-bit counts. The mappings are from six-bit values because the remaining
two bits of the gray-level values in the mappings in any cell are the same as
the last two bits of the hypercube address of that cell. The mappings are to
three-bit values because within a block there can be no more than four cells
with the same gray-level value. Also, no mapping piece can have a domain with
size greater than four, since there are at most four distinct gray-level
values in a block of four cells. Thus four pairs in each cell, each pair
requiring nine bits, will hold the mapping for the block. (Note that we could
actually hold the mapping with merely one pair per cell. But then we would
lose the relationship between the gray-level values stored at a given cell and
the hypercube address of that cell, which is essential in obtaining the speed
of the algorithm. Furthermore, we will see that 36 bits per cell are enough to
hold all of the histogram information for any level of the pyramid.)

We now skip to level 3. At this level, each block contains 64 cells. Any cell
with local address

    $c = c_5 c_4 \cdots c_0$

will contain the histogram information for all gray-level values of the form

    $g_7 g_6 c_5 c_4 \cdots c_0$.

Of course, there are only four gray-level values of this form. Also, within a
block at this level, no more than 64 cells can have the same gray-level value.
Thus the mapping in cell c can be given as four ordered pairs, with the first
element of each pair being a two-bit gray-level value (the remaining six bits
given implicitly by c), and the second element of each pair being a six-bit
count. (Actually, the mapping could simply be stored as an array of four
six-bit numbers, so the first element of each pair is really unnecessary.)

At levels 4 and above, any cell holds the histogram value for only one
gray-level value. That gray-level value is given by the last eight bits of the
cell's hypercube address, so the cell need only contain a count, which can
require no more than 20 bits.

The most difficult case is that at level 2, where each block contains 16
cells. Consider a block B containing a cell with local address

    $c = c_3 c_2 c_1 c_0$.

This cell must contain the histogram information over B for all gray-level
values of the form

    $g_7 g_6 g_5 g_4 c_3 c_2 c_1 c_0$.

There are 16 such gray-level values, and for each of them there is a histogram
value which can be anywhere from 0 to 16, since it is possible that all cells
in B have the same gray-level value. However, we notice that the sum of these
histogram values cannot exceed 16, since together they represent part of a
histogram for a block of 16 cells. So we can encode the mapping as a 32-bit
binary number containing exactly 16 1s, and in which the number of 0s between
the nth and (n+1)st 1s gives the mapping value for n. The 32 bit encoding of
this mapping seems reasonable, since from a theoretical standpoint, no
encoding for the mapping could use fewer than 29 bits.
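
One possible reading of this encoding is sketched below. The text does not fix where the unused 0s go or how the last 1 is handled, so the convention here (unused 0s lead the word, and each 1 is followed by the 0s for its own gray-level value) is an assumption, as are the function names.

```python
from math import comb


def encode(counts: list[int]) -> str:
    """Encode 16 counts with sum <= 16 as a 32-character string of 0s and 1s."""
    assert len(counts) == 16 and all(c >= 0 for c in counts)
    assert sum(counts) <= 16
    slack = 16 - sum(counts)
    # Assumed convention: slack zeros first, then a 1 followed by counts[n] zeros.
    return "0" * slack + "".join("1" + "0" * c for c in counts)


def decode(bits: str) -> list[int]:
    """Recover the 16 counts from the encoded string."""
    body = bits[bits.index("1"):]          # drop the leading slack zeros
    return [len(run) for run in body.split("1")[1:]]


def admissible_mappings() -> int:
    """Number of 16-tuples of non-negative counts whose sum is at most 16."""
    return comb(32, 16)
```

The number of admissible mappings, `comb(32, 16)`, exceeds $2^{28}$, which is consistent with the lower bound quoted above.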

So  we  see  that  in  this  case  the  histogram  information  at  any  level  of  the 
pyramid  can  be  stored  using  no  more  than  36  bits  per  cell. 

10.  Summary  and  Comments

This  paper  has  presented  a  log  n  time  algorithm  for  the  computation  of  the 
histogram  of  a  collection  of  values  distributed  one  per  cell  across  a  hypercube 
multiprocessor.  It  uses  the  technique  of  regarding  the  hypercube  as  a  multiplexed 
implementation  of  a  “fat”  pyramid,  in  which  the  processor  at  each  pyramid  node 
is  larger  than  the  processors  at  that  node’s  children. 

The  algorithm  requires  at  most  space  per  cell  where  s  is  the  number  of 
bits  required  to  store  one  of  the  histogram  domain  values,  although  much  less 
storage  may  be  necessary  if  appropriate  encodings  of  intermediate  results  are 
used. Thus it is most appropriate for use where s is less than 2L, that is,
when the possible range of the values is less than the total number of values.
Machine vision applications involving arrays of tens or hundreds of thousands
of gray-level values quantized to a few hundred or thousand possible levels
are natural applications for this algorithm.

In some cases it is desirable for the resulting histogram array to be stored
such that values adjacent in the histogram array are located in adjacent
processors. Consider the case of a two-dimensional array of gray-scale values,
and regard this array as a mapping from position to value. We see that the
gray-scale values are the dependent values of this mapping. But notice that
their role changes to that of independent values when considering the
histogram itself as a mapping. We were able to have the (dependent) gray-scale
values of adjacent (independent) positions reside in adjacent processors by
first transforming the coordinates for a given position to Gray Code values,
interleaving (or concatenating) these transformed coordinates, and storing the
corresponding gray-scale value in the processor whose address is given by the
interleaved (or concatenated) Gray Code values.

In the computed histogram, the gray-scale values play the role of independent
values of a mapping whose dependent values are indexed by processor addresses.
This suggests by analogy how to obtain the desired adjacency of the
(dependent) histogram values. Specifically, consider what happens if we
transform each gray-scale value to its corresponding Gray Code value, before
performing the histogram computation. Then, when the histogram computation is
complete, the histogram value mapped to by some given gray-scale value can be
found by converting that gray-scale value to a Gray Code value, and then
accessing a cell whose address ends in the Gray Code value. Note that two
sequential gray-scale values have corresponding Gray Code values differing in
exactly one bit position, and so the histogram values mapped to by these two
gray-scale values will be found in adjacent cells.
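
A brief sketch of the lookup, assuming the gray-scale values were Gray-coded before the histogram was computed; `bin_suffix` is a hypothetical name for the address suffix to access.

```python
def gray(v: int) -> int:
    """Binary reflected Gray Code value corresponding to v."""
    return v ^ (v >> 1)


def bin_suffix(value: int) -> int:
    """Low-order address bits of the cells holding the count for `value`."""
    return gray(value)


def bins_adjacent(value: int) -> bool:
    """True when the bins for value and value + 1 differ in one address bit."""
    return bin(bin_suffix(value) ^ bin_suffix(value + 1)).count("1") == 1
```

For 8-bit values, `bins_adjacent(v)` holds for every v from 0 through 254.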

This idea generalizes easily to histograms with multi-dimensional domains,
where it may be desired that the histogram values corresponding to adjacent
values in the domain be found in adjacent cells. In this case, each of the
elements in the histogram domain tuple in each cell is first converted to a
Gray Code value. Then these Gray Code tuples are interleaved (or
concatenated), and the histogram of the resulting values is computed. Now, in
order to access the histogram value for a given domain tuple, the same
transformation originally applied to the domain tuples is applied to the given
tuple, and the corresponding cell accessed. We see here that two adjacent
domain tuples, that is, two tuples differing by one in one of their elements,
will have converted (interleaved Gray Code) values differing in one bit
position, and so will map to histogram values which are found in adjacent
cells.


References 


1.  T.G. Chan and Y. Saad, “Multigrid Algorithms on the Hypercube
Multiprocessor,” IEEE Transactions on Computers, vol. C-35, no. 11,
pp. 969-977, November 1986.

2.  S.L. Tanimoto, “Sorting, Histogramming, and Other Statistical Operations
on a Pyramid Machine,” in Multiresolution Image Processing and Analysis,
ed. A. Rosenfeld, pp. 136-145, Springer-Verlag.


[Figure: Grid Mapping (hypercube cell addresses arranged on an 8 × 8 grid)]


